Getting Started
Agentic Browser is a next-generation browser extension powered by a Python MCP (Model Context Protocol) server. It enables an intelligent agent to understand and act on web content, supporting multiple LLM providers and offering a secure, declarative action system for browser automation.
Key goals:
Model-agnostic agent backend using Python, LangChain, and MCP
Secure browser extension using WebExtensions API
Advanced agent workflows with retrieval-augmented generation and multi-step tasks
Strong guardrails and transparency layers
Open-source extensibility
The repository is organized into backend and frontend components:
Backend (Python): FastAPI server and MCP server
Frontend (TypeScript/React): Browser extension built with WXT
Shared tools and services for specialized workflows
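Repository layout (diagram):
Backend
FastAPI Server: api/run.py
MCP Server: mcp_server/server.py
Core Config & LLM: core/config.py, core/llm.py
Frontend
WXT Config: extension/wxt.config.ts
Background Script: entrypoints/background.ts
Content Script: entrypoints/content.ts
Agent Utils: entrypoints/utils/*
Dependencies: FastAPI Server -> Core Config & LLM; MCP Server -> Core Config & LLM; WXT Config -> Background Script; WXT Config -> Content Script; Background Script -> Content Script; Agent Utils -> FastAPI Server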
Before installing Agentic Browser, ensure your environment meets the following requirements:
Python
Version requirement: Python >= 3.12
Package manager: uv (recommended) or pip
Virtual environment recommended
Node.js and Package Manager
Node.js version: managed by the project’s package manager configuration
Package manager: pnpm (recommended) or npm
TypeScript support included in the frontend configuration
Browser Extension Development
Chromium-based browsers (Chrome, Edge, Brave) or Firefox for development
Permissions and manifest configuration defined in the extension manifest
WebExtensions API support for background scripts, content scripts, and side panel
LLM Provider Keys (optional for initial setup)
Supported providers include Google, OpenAI, Anthropic, Ollama, DeepSeek, OpenRouter
API keys or base URLs configured via environment variables or UI
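The requirements above can be checked with a short script. This is a minimal sketch, not part of the project: the function name check_prerequisites and the choice to probe for uv, node, and pnpm on PATH are assumptions made for illustration. It does not check browsers or API keys.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # stated requirement: Python >= 3.12
# Tools named in the prerequisites; uv and pnpm are recommended, not mandatory.
TOOLS = ("uv", "node", "pnpm")


def check_prerequisites(version=None, which=shutil.which):
    """Return a dict mapping each prerequisite to True (found) or False."""
    version = tuple(version or sys.version_info[:2])
    report = {"python>=3.12": version >= MIN_PYTHON}
    for tool in TOOLS:
        # shutil.which returns the executable path, or None if not on PATH.
        report[tool] = which(tool) is not None
    return report
```

Calling check_prerequisites() with no arguments inspects the running interpreter and the current PATH. A False entry for uv or pnpm is not fatal if pip or npm is used instead.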
Follow these step-by-step instructions to install both backend and frontend components.
Backend (Python)
Clone the repository and navigate to the project root.
Create and activate a Python virtual environment (recommended).
Install dependencies using uv:
Run: uv pip install -e .
Verify the installation by confirming that the dependencies declared in pyproject.toml are present in the environment.
Notes:
The project uses uv for dependency resolution and installation.
The main entry point can start either the FastAPI server or the MCP server.
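One way to verify the backend installation is to ask the interpreter which distributions it can see. The sketch below assumes the distribution names fastapi, uvicorn, langchain, langgraph, and mcp, taken from the backend dependencies this guide names; pyproject.toml remains the authoritative list, and missing_packages is a hypothetical helper.

```python
from importlib import metadata

# Assumed distribution names; adjust to match pyproject.toml.
EXPECTED = ("fastapi", "uvicorn", "langchain", "langgraph", "mcp")


def missing_packages(names=EXPECTED):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty list from missing_packages(), run inside the activated virtual environment, suggests the install step completed.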
Frontend (Extension)
Navigate to the extension directory.
Install dependencies using pnpm:
Run: pnpm install
Build or develop the extension:
Development: pnpm dev
Production build: pnpm build
Firefox builds: pnpm dev:firefox or pnpm build:firefox
Manifest and permissions:
The extension manifest defines permissions for tabs, storage, scripting, identity, side panel, web navigation, web request, cookies, bookmarks, history, clipboard, notifications, context menus, and downloads.
Host permissions include <all_urls>.
Configure environment variables and basic settings before launching the servers.
Environment Configuration
Backend host and port defaults are configurable via environment variables.
Debug logging level is controlled by environment variables.
Key variables:
BACKEND_HOST: Server host binding (default: 0.0.0.0)
BACKEND_PORT: Server port (default: 5454)
DEBUG: Enable debug logging (default depends on environment)
GOOGLE_API_KEY: Google provider API key (required for Google provider)
OPENAI_API_KEY, ANTHROPIC_API_KEY, OLLAMA_BASE_URL, DEEPSEEK_API_KEY, OPENROUTER_API_KEY: Additional provider keys and base URLs
Note: The backend loads environment variables from a .env file automatically.
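To make the defaults concrete, here is a minimal sketch of reading these variables. It is not the project's core/config.py: the BackendSettings class, the load_settings function, the set of truthy strings, and the choice to treat an unset DEBUG as False are all assumptions for illustration. Loading of the .env file is left out.

```python
import os
from dataclasses import dataclass

# Assumption: these strings (case-insensitive) switch DEBUG on.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class BackendSettings:
    host: str
    port: int
    debug: bool


def load_settings(env=None):
    """Build settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return BackendSettings(
        host=env.get("BACKEND_HOST", "0.0.0.0"),    # documented default
        port=int(env.get("BACKEND_PORT", "5454")),  # documented default
        debug=env.get("DEBUG", "").strip().lower() in TRUTHY,
    )
```

Passing a plain dict instead of os.environ keeps the function easy to exercise without touching the real environment.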
API Key Setup for LLM Providers
For Google provider, set GOOGLE_API_KEY.
For OpenAI, Anthropic, DeepSeek, and OpenRouter, set the respective API keys.
For Ollama, configure OLLAMA_BASE_URL if using a custom endpoint.
Keys can be provided directly to the LLM client or via environment variables.
UI-based key management:
The extension includes a UI component for saving API keys locally in the extension storage.
Basic Configuration Options
Backend host/port: Controlled by environment variables.
Debug logging: Controlled by environment variables.
Provider selection and model defaults are defined in the LLM configuration.
Launch the MCP server, install the browser extension, and perform basic browser automation tasks.
Launch the MCP Server
From the project root, run the main entry point with the MCP flag:
Command: python main.py --mcp
Alternatively, use the script alias defined in pyproject.toml:
Command: agentic-mcp
Verification:
The MCP server initializes and exposes tools for LLM generation, GitHub Q&A, and website content conversion.
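The --mcp flag suggests an entry point that selects which server to start. The sketch below shows one way such a switch could be written with argparse. It is an illustration only: build_parser and select_mode are hypothetical names, and the assumption that omitting the flag selects the FastAPI server is not confirmed by this guide.

```python
import argparse


def build_parser():
    """Parser with a single boolean switch, mirroring `python main.py --mcp`."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument(
        "--mcp",
        action="store_true",
        help="start the MCP server (assumed default: the API server)",
    )
    return parser


def select_mode(argv=None):
    """Return "mcp" when --mcp is given, otherwise "api"."""
    args = build_parser().parse_args(argv)
    return "mcp" if args.mcp else "api"
```

With argv=None, argparse reads sys.argv, so the same function serves both the command line and tests.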
Install the Browser Extension
Build the extension:
Development: pnpm dev
Production: pnpm build
Load the unpacked extension in your browser:
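Chrome/Edge: Open chrome://extensions (edge://extensions in Edge), enable Developer mode, and choose Load unpacked on the extension build output directory
Firefox: Open about:debugging, select This Firefox, and choose Load Temporary Add-on on the manifest file in the build output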
Permissions:
The extension requests broad permissions for tabs, storage, scripting, identity, side panel, web navigation, web request, cookies, bookmarks, history, clipboard, notifications, context menus, and downloads.
Perform Basic Browser Automation Tasks
Use slash commands in the extension UI to trigger agent workflows.
Example slash commands include:
/browser-action: Execute browser automation tasks (navigate, click, type, scroll)
/react-ask: Chat with the ReAct agent
/google-search: Perform a quick web search
/gmail-unread: Check unread emails
/calendar-events: View upcoming schedule
/youtube-ask: Q&A with YouTube videos
Execution flow:
The extension parses slash commands and routes them to the backend via HTTP requests.
The background script handles messaging and action execution, including tab/window control and DOM manipulation.
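The parse-and-route step in this flow can be sketched as follows. The extension implements it in TypeScript; Python is used here only to illustrate the logic. The command names come from this guide, while the endpoint paths in ROUTES, the payload shape, and the function name parse_slash_command are hypothetical.

```python
# Hypothetical routing table: keys are documented commands, paths are invented.
ROUTES = {
    "browser-action": "/api/browser-action",
    "react-ask": "/api/react-ask",
    "google-search": "/api/google-search",
    "gmail-unread": "/api/gmail-unread",
    "calendar-events": "/api/calendar-events",
    "youtube-ask": "/api/youtube-ask",
}


def parse_slash_command(text):
    """Split "/name rest of prompt" into a routing payload, or return None."""
    text = text.strip()
    if not text.startswith("/"):
        return None  # ordinary chat input, not a command
    name, _, argument = text[1:].partition(" ")
    if name not in ROUTES:
        return None  # unknown command
    return {"command": name, "endpoint": ROUTES[name], "prompt": argument.strip()}
```

For example, parse_slash_command("/google-search uv installer") yields the google-search route with the prompt "uv installer", which a client could then send to the backend over HTTP.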
Agentic Browser integrates a Python MCP server with a React-based browser extension. The extension communicates with the backend to execute agent workflows and browser actions.
Architecture (diagram):
Extension
Background Script: entrypoints/background.ts
Content Script: entrypoints/content.ts
Side Panel & UI
Backend
MCP Server: mcp_server/server.py
FastAPI Server: api/run.py
Config & LLM: core/config.py, core/llm.py
Connections: Side Panel & UI -> Background Script; Background Script <-> FastAPI Server; Background Script <-> MCP Server; MCP Server -> Config & LLM; FastAPI Server -> Config & LLM; Content Script -> Background Script
Backend Servers
FastAPI Server
Starts the API server with configurable host and port.
Provides endpoints for agent workflows and tools.
MCP Server
Exposes tools for LLM generation, GitHub Q&A, and website content conversion.
Supports multiple providers with dynamic configuration.
LLM Configuration and Provider Support
Provider configurations define default models, API key environment variables, and base URLs.
The LLM client validates keys and base URLs and raises descriptive errors if missing.
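The validation described here might look like the sketch below. It is not the project's core/llm.py: the PROVIDERS table layout, the ProviderConfigError class, and resolve_credentials are hypothetical. The environment variable names come from this guide, and http://localhost:11434 is Ollama's standard local address, assumed here as the fallback. Default models are omitted because the guide does not list them.

```python
import os

# Hypothetical provider table keyed by lowercase provider name.
PROVIDERS = {
    "google": {"api_key_env": "GOOGLE_API_KEY"},
    "openai": {"api_key_env": "OPENAI_API_KEY"},
    "anthropic": {"api_key_env": "ANTHROPIC_API_KEY"},
    "deepseek": {"api_key_env": "DEEPSEEK_API_KEY"},
    "openrouter": {"api_key_env": "OPENROUTER_API_KEY"},
    "ollama": {
        "base_url_env": "OLLAMA_BASE_URL",
        "default_base_url": "http://localhost:11434",  # assumed fallback
    },
}


class ProviderConfigError(ValueError):
    """Raised when a provider is unknown or its credentials are missing."""


def resolve_credentials(provider, env=None, api_key=None):
    """Return {"api_key": ...} or {"base_url": ...} for the given provider."""
    env = os.environ if env is None else env
    cfg = PROVIDERS.get(provider.lower())
    if cfg is None:
        raise ProviderConfigError(
            f"Unknown provider {provider!r}; expected one of {sorted(PROVIDERS)}"
        )
    if "api_key_env" in cfg:
        # An explicitly passed key wins over the environment variable.
        key = api_key or env.get(cfg["api_key_env"])
        if not key:
            raise ProviderConfigError(
                f"Missing API key for {provider}: set {cfg['api_key_env']} "
                "or pass api_key"
            )
        return {"api_key": key}
    return {"base_url": env.get(cfg["base_url_env"]) or cfg["default_base_url"]}
```

Naming the missing variable in the error message is the point of the design: a typo in an environment variable name shows up immediately instead of as an opaque authentication failure.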
Extension Components
Background Script
Handles messaging, tab/window control, and action execution.
Injects content scripts and performs DOM manipulation.
Content Script
Provides lightweight page interaction helpers.
Agent Utilities
Parse slash commands and route to backend endpoints.
Capture page context and construct payloads for agent workflows.
Backend dependencies are declared in pyproject.toml and include FastAPI, Uvicorn, LangChain, LangGraph, MCP, and others.
Frontend dependencies are declared in extension/package.json and include React, WXT, socket.io-client, and UI libraries.
Use uv for faster dependency resolution and installation compared to pip.
Prefer production builds for the extension to minimize bundle size.
Minimize repeated DOM queries and injections; batch actions when possible.
Configure logging appropriately (DEBUG vs INFO) to reduce overhead during production runs.
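As an illustration of the logging advice, the sketch below maps a debug switch to a log level using the standard library. The function name configure_logging and the message format are assumptions; the project may configure logging differently.

```python
import logging


def configure_logging(debug: bool) -> int:
    """Set the root logger to DEBUG or INFO and return the chosen level."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed by earlier configuration
    )
    return level
```

At INFO level, logger.debug(...) calls return early without formatting their messages, which is where the reduced overhead in production comes from.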
Common setup and runtime issues:
Missing or unsupported Python version
Ensure Python >= 3.12 is installed and selected in your environment.
Missing Node.js or pnpm
Install Node.js and pnpm; rebuild the extension after installation.
Backend server startup
Use the main entry point with the MCP flag to start the MCP server.
Verify host and port settings via environment variables.
LLM provider configuration errors
Ensure required API keys or base URLs are set for the chosen provider.
Check for typos in environment variable names.
Extension not loading
Confirm permissions in the manifest and load the extension as unpacked.
Check browser developer tools for errors.
Action execution failures
Verify that the active tab is reachable and not blocked by CORS or privacy restrictions.
Review background script logs for detailed error messages.
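When the extension cannot reach the backend, a quick TCP probe helps separate "server not running" from "wrong host or port". This is a minimal sketch under stated assumptions: backend_reachable is a hypothetical helper, it reads the same BACKEND_HOST and BACKEND_PORT variables with the documented defaults, and it only confirms that something accepts connections on that port, not that it is the Agentic Browser API.

```python
import os
import socket


def backend_reachable(host=None, port=None, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    host = host or os.environ.get("BACKEND_HOST", "0.0.0.0")
    port = int(port or os.environ.get("BACKEND_PORT", "5454"))
    # 0.0.0.0 is a bind address, not a destination; probe loopback instead.
    if host == "0.0.0.0":
        host = "127.0.0.1"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False
```

If this returns False while the server process is running, compare the port printed in the server logs with BACKEND_PORT in the environment or .env file.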
Agentic Browser combines a powerful Python MCP server with a modern React-based browser extension to deliver model-agnostic, secure, and extensible web automation. By following the prerequisites, installation steps, and initial setup guide, you can quickly launch the backend servers, install the extension, and start performing automated browser tasks using slash commands.
Appendix A: Environment Variables Reference
BACKEND_HOST: Backend host binding (default: 0.0.0.0)
BACKEND_PORT: Backend port (default: 5454)
DEBUG: Enable debug logging
GOOGLE_API_KEY: Google provider API key
OPENAI_API_KEY: OpenAI provider API key
ANTHROPIC_API_KEY: Anthropic provider API key
OLLAMA_BASE_URL: Ollama base URL
DEEPSEEK_API_KEY: DeepSeek provider API key
OPENROUTER_API_KEY: OpenRouter provider API key
Appendix B: Example Slash Commands
/browser-action: Execute browser automation tasks (navigate, click, type, scroll)
/react-ask: Chat with the ReAct agent
/google-search: Perform a quick web search
/gmail-unread: Check unread emails
/calendar-events: View upcoming schedule
/youtube-ask: Q&A with YouTube videos